Scaling RL: The Move to Function Approximation
AI029 Lesson 9

Moving from tabular methods to function approximation (FA) means confronting a hard limit: for most complex environments, a lookup table with one entry per state is infeasible to store, let alone to fill in from experience. When the number of states $n$ becomes astronomically large, we instead represent the value function as a parameterized mapping whose weight vector $\mathbf{w}$ has $m \ll n$ components: $v_{\pi}(s) \approx \hat{v}(s, \mathbf{w})$.
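To make the mapping $\hat{v}(s, \mathbf{w})$ concrete, here is a minimal sketch that assumes the simplest case, a linear approximator $\hat{v}(s, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$. The state space size `N_STATES`, the feature function `x`, and the weight values are hypothetical choices for illustration, not part of the lesson.

```python
from typing import List, Sequence

N_STATES = 1_000_000  # hypothetical n: far too many states for a per-state table


def x(s: int) -> List[float]:
    """Hypothetical feature vector for state s: a bias term plus a normalized index."""
    return [1.0, s / (N_STATES - 1)]


def v_hat(s: int, w: Sequence[float]) -> float:
    """Linear value estimate: the dot product of the weights and the features of s."""
    return sum(wi * xi for wi, xi in zip(w, x(s)))


# m = 2 weights stand in for a table of one million entries (illustrative values).
w = [0.5, 2.0]

print(v_hat(0, w))             # estimate for the first state
print(v_hat(N_STATES - 1, w))  # estimate for the last state
```

Every state gets an estimate, yet only the two numbers in `w` are stored and learned. A real agent would use richer features or a neural network, but the interface stays the same: a state and a weight vector go in, a value estimate comes out.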

[Figure: States (n), Parameters (w) with m ≪ n, Smooth Value Surface]

The Feasibility Inquiry

You might ask: "Is there any reason to think this might be possible?" Can we really represent $10^{170}$ board states with just a few million weights? The answer lies in the regularity of our world: similar states usually have similar values. By describing states with features normalized into a stable range like [0, 1], we allow the agent to detect patterns (like "territory control" in Go) that apply to billions of configurations it has never explicitly seen.
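One common way to bring a feature into [0, 1] is min-max scaling. The sketch below assumes a hypothetical raw feature, the number of board points a player controls on a 19×19 Go board. The helper `normalize` and that feature are illustrative names, not something the lesson prescribes.

```python
def normalize(raw: float, lo: float, hi: float) -> float:
    """Min-max scale raw from [lo, hi] into [0, 1], clipping out-of-range values."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (raw - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))


BOARD_POINTS = 19 * 19  # 361 intersections on a standard Go board

# Hypothetical raw feature: number of points a player currently controls.
for controlled in (0, 90, 361):
    print(controlled, normalize(controlled, 0, BOARD_POINTS))
```

Keeping every feature on the same scale means no single input dominates the weighted sum just because of its units.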

Generalization vs. Discrimination

  • Generalization: The superpower of FA. Learning about state A informs the estimate for a similar state B. This is the only way to scale.
  • Discrimination: The ability to tell two states apart. Tabular methods have perfect discrimination but zero generalization; FA gives up some discrimination in exchange for generalization, which lets it estimate values for states it has never visited.
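The trade-off can be seen in a single update. The sketch below assumes two hypothetical states A and B with nearly identical feature vectors, a step size `alpha`, and one gradient step toward a target value for A. All numbers are illustrative.

```python
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Dot product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical features (bias, normalized feature) for two similar states.
x_a: List[float] = [1.0, 0.50]
x_b: List[float] = [1.0, 0.52]

alpha = 0.1   # step size (assumed)
target = 1.0  # observed return for state A (assumed)

# Tabular update: only the entry for A moves, B is untouched.
table = {"A": 0.0, "B": 0.0}
table["A"] += alpha * (target - table["A"])

# Linear FA update: one gradient step on the squared error at A.
w = [0.0, 0.0]
error = target - dot(w, x_a)
w = [wi + alpha * error * xi for wi, xi in zip(w, x_a)]

fa_a = dot(w, x_a)
fa_b = dot(w, x_b)

print("tabular:", table)
print("linear FA: A =", fa_a, " B =", fa_b)
```

After the update, the table still reports 0.0 for B, while the linear estimate for B has moved almost as far as the estimate for A. That is generalization at work. The cost is that A and B can no longer be adjusted fully independently, which is the loss of discrimination described above.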